REAL-TIME COVID-19: DATA ANALYSIS & VISUALIZATION

Bernardo Carraro Detanico

01 April 2020

Update log:

  • 22 May 2020: Dataset updated.
  • 23 May 2020: New section added - COVID-19 in United Kingdom Analysis.

The outbreak of the new coronavirus disease (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has quickly become a global health emergency.

  • On 31 December 2019, the WHO (World Health Organization) China Country Office was informed of cases of pneumonia of unknown aetiology (unknown cause) detected in Wuhan City, Hubei Province of China.
  • The Chinese authorities identified a new type of coronavirus, which was isolated on 7 January 2020.
  • On 30 January 2020, the WHO declared that the outbreak of 2019-nCoV (the provisional name of SARS-CoV-2) constitutes a Public Health Emergency of International Concern.
  • All coronaviruses that have caused disease in humans have had animal origins, generally in bats, rodents, civet cats and dromedary camels. The WHO reports that SARS-CoV-2 most probably has its ecological reservoir in bats, and that transmission to humans likely occurred through an intermediate animal host (a domestic animal, a wild animal or a domesticated wild animal) which has not yet been identified.

    This project collects coronavirus disease (COVID-19) data from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) repository (https://github.com/CSSEGISandData/COVID-19) and from the Government of the United Kingdom - GOV.UK (https://coronavirus.data.gov.uk). Both sources are updated daily and contain time series data on confirmed cases and deaths.

    In [1]:
    #Libraries:
    import pandas as pd
    import numpy as np
    import datetime as dt
    from datetime import datetime
    import plotly.express as px
    import plotly.graph_objects as go
    from plotly.subplots import make_subplots
    
    In [2]:
    # Importing data from Github:
    url1 = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
    df_conf = pd.read_csv(url1)
    
    url2 = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'
    df_deaths = pd.read_csv(url2)
    
    In [3]:
    #EDA
    df_conf.dtypes
    df_deaths.dtypes
    
    # Finding the null values
    df_conf.isnull().sum().sort_values(ascending=False)
    df_deaths.isnull().sum().sort_values(ascending=False)
    
    #Deleting the 'Province/State' Column
    df_conf = df_conf.drop(columns='Province/State')
    df_deaths = df_deaths.drop(columns='Province/State')
    
    # Dropping column with NaN
    df_conf = df_conf.dropna(axis = 1)
    df_deaths = df_deaths.dropna(axis = 1)
    
    # Group by 'Country/Region' column:
    df_conf_grouped = df_conf.groupby('Country/Region').sum()
    df_deaths_grouped = df_deaths.groupby('Country/Region').sum()
    
    # Change the display precision option in Pandas
    pd.set_option('display.precision', 0)
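    The grouping step above collapses the per-province rows into one row per country by summing every date column. Below is a minimal standard-library sketch of the same idea. It assumes toy rows in the JHU wide layout; the country names and counts are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows in the JHU CSSE wide layout (after dropping Province/State):
# one cumulative count per date column.
DATES = ["1/22/20", "1/23/20"]
ROWS = [
    {"Country/Region": "Country A", "1/22/20": 10, "1/23/20": 12},
    {"Country/Region": "Country A", "1/22/20": 3, "1/23/20": 5},
    {"Country/Region": "Country B", "1/22/20": 0, "1/23/20": 1},
]


def country_totals(rows, dates):
    """Sum each date column over all rows (provinces) of the same country."""
    totals = defaultdict(lambda: [0] * len(dates))
    for row in rows:
        for i, day in enumerate(dates):
            totals[row["Country/Region"]][i] += row[day]
    return dict(totals)


print(country_totals(ROWS, DATES))
# {'Country A': [13, 17], 'Country B': [0, 1]}
```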
    

    1. Geographic distribution of confirmed cases worldwide

    In [4]:
    #Confirmed cases over the days
    df_conf_melt = df_conf.melt(['Country/Region', 'Lat', 'Long'], var_name='Date', value_name='Confirmed')
    df_conf_melt['Date'] = pd.to_datetime(df_conf_melt['Date']).dt.date
    df_conf_melt = df_conf_melt.sort_values(by=['Date'])
    df_conf_melt['Date'] = df_conf_melt['Date'].astype(str)
    df_conf_melt['Confirmed'] = df_conf_melt['Confirmed'].replace(np.nan, 0)
    
    fig = px.scatter_geo(df_conf_melt, lat="Lat", lon="Long", color="Country/Region",
                         hover_name="Country/Region", size="Confirmed", projection="natural earth")
    
    fig.update_layout(showlegend=False, margin=dict(l=20, r=20, t=20, b=20))
    fig.show()
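    The `melt` call turns the wide table (one column per date) into a long table with one row per country and date, which is the shape `px.scatter_geo` expects. The plain-Python sketch below performs that reshaping on hypothetical data. It also shows why the dates are parsed before sorting: `m/d/yy` strings do not sort chronologically as text.

```python
from datetime import datetime

# Hypothetical wide rows: identifier columns plus one cumulative count per date.
WIDE = [
    {"Country/Region": "Country A", "Lat": 1.0, "Long": 2.0, "9/30/20": 5, "10/1/20": 8},
    {"Country/Region": "Country B", "Lat": 3.0, "Long": 4.0, "9/30/20": 0, "10/1/20": 2},
]
ID_VARS = ("Country/Region", "Lat", "Long")


def melt(rows, id_vars, var_name="Date", value_name="Confirmed"):
    """One output record per (row, non-identifier column), like DataFrame.melt."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key in id_vars:
                continue
            record = {k: row[k] for k in id_vars}
            record[var_name] = key
            record[value_name] = value
            long_rows.append(record)
    return long_rows


long_rows = melt(WIDE, ID_VARS)
# Sorting the raw strings would put "10/1/20" before "9/30/20";
# parsing with the JHU date format gives chronological order.
by_date = sorted(long_rows, key=lambda r: datetime.strptime(r["Date"], "%m/%d/%y"))
print([(r["Country/Region"], r["Date"], r["Confirmed"]) for r in by_date])
```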
    

    2. Total number of confirmed cases and deaths over time

    In [5]:
    # Global totals per date: drop the non-date columns, then sum over all countries
    df_conf_total = df_conf.drop(columns=['Country/Region', 'Lat', 'Long']).sum(axis=0).reset_index().rename(
        columns={'index': 'Date', 0: 'Total Confirmed'})
    df_deaths_total = df_deaths.drop(columns=['Country/Region', 'Lat', 'Long']).sum(axis=0).reset_index().rename(
        columns={'index': 'Date', 0: 'Total Deaths'})
    
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=df_conf_total['Date'], y=df_conf_total['Total Confirmed'],
                             mode='lines+markers', line_shape='spline', name="confirmed cases", fill='tozeroy'))
    fig.add_trace(go.Scatter(x=df_deaths_total['Date'], y=df_deaths_total['Total Deaths'],
                             mode='lines+markers', line_shape='spline', name="deaths", fill='tozeroy'))
    fig.add_annotation(
                x=df_conf_total.iloc[-1,0],
                y=df_conf_total.iloc[-1,1],
                text=df_conf_total.iloc[-1,1])
    fig.add_annotation(
                x=df_deaths_total.iloc[-1,0],
                y=df_deaths_total.iloc[-1,1],
                text=df_deaths_total.iloc[-1,1])
    fig.update_layout(legend=dict(
            x=0,
            y=1.0,
            bgcolor='rgba(255, 255, 255, 0)',
            bordercolor='rgba(255, 255, 255, 0)'
            ), yaxis_title="Number",
                     margin=dict(l=20, r=20, t=20, b=20))
    fig.show()
    

    3. Fatality rate

    The case fatality rate is defined as the number of deaths among people who tested positive for SARS-CoV-2 divided by the number of confirmed SARS-CoV-2 cases. It is typically used as a measure of disease severity and is related to patients' characteristics, health care system features, and cultural and socioeconomic factors.
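    Cell In [6] computes this ratio from the global cumulative totals on each date: fatality rate (%) = 100 × cumulative deaths / cumulative confirmed cases. Here is a minimal sketch of the ratio using hypothetical totals.

```python
def case_fatality_rate(deaths, confirmed):
    """Fatality rate in percent from cumulative deaths and cumulative confirmed cases."""
    if confirmed <= 0:
        raise ValueError("the rate is undefined without confirmed cases")
    return 100 * deaths / confirmed


# Hypothetical cumulative totals on one date.
print(case_fatality_rate(deaths=350, confirmed=5000))  # 7.0
```

    Both counts only include confirmed cases, so this ratio depends on testing coverage and on the delay between diagnosis and death. It is therefore not the same as the probability of dying after infection.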

    In [6]:
    # Fatality rate: global cumulative deaths and confirmed cases per date
    df_deaths_melt = df_deaths.melt(['Country/Region', 'Lat', 'Long'], var_name='Date', value_name='Deaths')
    df_deaths_melt = df_deaths_melt.drop(columns=['Country/Region', 'Lat', 'Long'])
    df_deaths_melt_g = df_deaths_melt.groupby('Date').sum()
    df_deaths_melt_g.reset_index(level=0, inplace=True)
    df_deaths_melt_g['Date'] = pd.to_datetime(df_deaths_melt_g['Date'])
    df_deaths_melt_g = df_deaths_melt_g.sort_values(by=['Date'])
    # Daily deaths = first difference; the first date keeps its cumulative value
    df_deaths_melt_g['deaths per day'] = df_deaths_melt_g['Deaths'].diff().fillna(df_deaths_melt_g['Deaths'])
    
    df_conf_melt = df_conf.melt(['Country/Region', 'Lat', 'Long'], var_name='Date', value_name='Confirmed')
    df_conf_melt = df_conf_melt.drop(columns=['Country/Region', 'Lat', 'Long'])
    df_conf_melt_g = df_conf_melt.groupby('Date').sum()
    df_conf_melt_g.reset_index(level=0, inplace=True)
    df_conf_melt_g['Date'] = pd.to_datetime(df_conf_melt_g['Date'])
    df_conf_melt_g = df_conf_melt_g.sort_values(by=['Date'])
    df_conf_melt_g.rename(columns = {'Date':'Date1'}, inplace = True)
    # Daily confirmed cases = first difference; the first date keeps its cumulative value
    df_conf_melt_g['confirmed per day'] = df_conf_melt_g['Confirmed'].diff().fillna(df_conf_melt_g['Confirmed'])
    
    join_melts = pd.concat([df_conf_melt_g, df_deaths_melt_g], axis=1).fillna(0)
    join_melts['Fatality rate (%)'] = (join_melts['Deaths'] / join_melts['Confirmed'])*100
    join_melts['Date'] = join_melts['Date'].dt.strftime('%m/%d/%y')
    
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=join_melts['Date'], y=join_melts['Fatality rate (%)'],
                             mode='lines+markers', line_shape='spline', name="Fatality rate (%)", line=dict(color='#00cc96')))
    # Annotate the most recent fatality rate
    fig.add_annotation(
                x=join_melts['Date'].iloc[-1],
                y=join_melts['Fatality rate (%)'].iloc[-1],
                text=round(join_melts['Fatality rate (%)'].iloc[-1], 2))
    fig.update_layout(yaxis_title="Fatality rate (%)", margin=dict(l=20, r=20, t=20, b=20))
    fig.show()
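    The `deaths per day` and `confirmed per day` columns are first differences of the cumulative series. The first day has no previous value, so its daily count equals its cumulative count. The small sketch below uses hypothetical numbers. A downward revision of a cumulative total yields a negative daily value, as seen for the United Kingdom on 5/20/20 in the table of section 6.

```python
def daily_from_cumulative(cumulative):
    """Daily counts from a cumulative series: first value kept, then successive differences."""
    if not cumulative:
        return []
    daily = [cumulative[0]]
    for previous, current in zip(cumulative, cumulative[1:]):
        daily.append(current - previous)
    return daily


print(daily_from_cumulative([5, 7, 7, 12]))  # [5, 2, 0, 5]
print(daily_from_cumulative([5, 7, 6]))      # [5, 2, -1]  (a downward revision)
```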
    

    4. Total number of confirmed cases, deaths and fatality rate for the 15 most affected countries

    In [7]:
    df_conf_grouped = df_conf.groupby('Country/Region').sum()
    df_conf_grouped['Total Confirmed'] = df_conf_grouped.iloc[:,-1]
    
    df_deaths_grouped = df_deaths.groupby('Country/Region').sum()
    df_deaths_grouped['Total Deaths'] = df_deaths_grouped.iloc[:,-1]
    
    df_merge = df_conf_grouped.merge(df_deaths_grouped, how='inner', on='Country/Region')
    df_merge.reset_index(level=0, inplace=True)
    df_merge.sort_values(by=['Total Deaths'], inplace=True, ascending=False)
    df_merge = df_merge.head(15)
    
    fig = make_subplots(rows=1, cols=3, subplot_titles=("Confirmed cases (nº)", "Deaths (nº)", "Fatality rate (%)"))
    fig.add_trace(
        go.Bar(x=df_merge['Country/Region'], y=df_merge['Total Confirmed']),
        row=1, col=1)
    fig.add_trace(
        go.Bar(x=df_merge['Country/Region'], y=df_merge['Total Deaths']),
        row=1, col=2)
    fig.add_trace(
        go.Bar(x=df_merge['Country/Region'], y=(df_merge['Total Deaths'] / df_merge['Total Confirmed'])*100),
        row=1, col=3)
    fig.update_layout(showlegend=False, margin=dict(l=20, r=20, t=20, b=20))
    fig.show()
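    The bar charts rank countries by total deaths on the latest date and compute each country's fatality rate from its own totals. The following is a hypothetical standard-library sketch of that ranking.

```python
# Hypothetical latest cumulative totals per country: (confirmed, deaths).
LATEST = {
    "Country A": (1000, 50),
    "Country B": (400, 40),
    "Country C": (2000, 20),
}


def top_by_deaths(latest, n):
    """(country, confirmed, deaths, fatality rate %) for the n countries with most deaths."""
    ranked = sorted(latest.items(), key=lambda item: item[1][1], reverse=True)[:n]
    return [(country, conf, deaths, 100 * deaths / conf) for country, (conf, deaths) in ranked]


for row in top_by_deaths(LATEST, 2):
    print(row)
# ('Country A', 1000, 50, 5.0)
# ('Country B', 400, 40, 10.0)
```

    Note that the country with the most deaths need not have the highest fatality rate, which is why the third panel can order countries differently from the second.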
    

    5. Daily new cases and deaths

    In [8]:
    df_conf_deaths = join_melts.drop(columns=['Confirmed', 'Date1', 'Deaths', 'Fatality rate (%)'])
    df_conf_deaths_m = df_conf_deaths.melt('Date', var_name='cols',  value_name='vals')
    
    # Bar Plot - Cases and deaths per day
    fig = px.bar(df_conf_deaths_m, x="Date", y="vals", color='cols', barmode='group',
                 height=400)
    fig.update_layout(legend=dict(
            x=0,
            y=1.0,
            bgcolor='rgba(255, 255, 255, 0)',
            bordercolor='rgba(255, 255, 255, 0)'
            ), barmode='group', xaxis_tickangle=-60, bargap=0.0,
            bargroupgap=0.0, yaxis_title="Number", margin=dict(l=20, r=20, t=20, b=20), legend_title='')
    fig.show()
    

    6. Daily new cases in the last 10 days for the 14 most affected countries + China

    In [9]:
    # Daily new cases per country for the last 10 days (table)
    df_conf_grouped = df_conf.groupby('Country/Region').sum()
    df_conf_grouped = df_conf_grouped.drop(columns=['Lat', 'Long'])
    # Daily new cases = first difference of the cumulative counts along the date columns
    df_conf_grouped = df_conf_grouped.diff(axis=1)
    df_conf_grouped.reset_index(level=0, inplace=True)
    # Keep 'Country/Region' plus the last 10 date columns
    df_conf_grouped = df_conf_grouped[['Country/Region'] + list(df_conf_grouped.columns[-10:])]
    # Rank countries by the most recent day's new cases
    df_conf_grouped = df_conf_grouped.sort_values(by=df_conf_grouped.columns[-1], ascending=False)
    df_conf_grouped['Total Cases'] = df_conf_grouped.sum(axis=1, numeric_only=True)
    df_china = df_conf_grouped[df_conf_grouped['Country/Region']=='China']
    df_conf_grouped = df_conf_grouped.head(14)
    df_total = pd.concat([df_china, df_conf_grouped], ignore_index=True)
    cols = df_total.columns[1:-1]
    df_total.style.background_gradient(subset=cols, cmap='Blues', axis=1)
    
    Out[9]:
        Country/Region  5/13/20  5/14/20  5/15/20  5/16/20  5/17/20  5/18/20  5/19/20  5/20/20  5/21/20  5/22/20  Total Cases
    0   China                 6        5        9        6       10        9        0        0        0       18           63
    1   US                21030    27368    25050    24996    18937    21551    20260    23285    25294    23790       231561
    2   Brazil            11923    13028    17126    13220     7569    14288    16517    19694    18508    20803       152676
    3   Russia            10028     9974    10598     9200     9709     8926     9263     8764     8849     8894        94205
    4   India              3763     3942     3787     4864     5050     4630     6147     5553     6198     6568        50502
    5   Chile              2660     2659     2502     1886     2353     2278     3520     4038     3964     4276        30136
    6   United Kingdom     3244     3455     3564     3457     3534     2714     2429     -519     2627     3298        27803
    7   Mexico             1862     2409     2437     2112     2075     2414     2713     2248     2973     2960        24203
    8   Peru               4247     4298     3891     4046     3732     2660     4550     4537     4749     2929        39639
    9   Saudi Arabia       1905     2039     2307     2840     2736     2593     2509     2691     2532     2642        24794
    10  Pakistan            962      490     3011        0     1352     1974     1841     1932     2193     2603        16358
    11  Iran               1958     1808     2102     1757     1806     2294     2111     2346     2392     2311        20885
    12  Qatar              1390     1733     1153     1547     1632     1365     1637     1491     1554     1830        15332
    13  Spain               661      849      643      515        0      908      431      518      482     1787         6794
    14  Bangladesh         1162     1041     1202      930     1273     1602     1251     1617     1773     1694        13545
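    The table is built in three steps:
    1. daily counts are taken as differences of the cumulative series;
    2. only the last 10 days are kept;
    3. countries are ranked by the most recent day, with China prepended for comparison.

    The standard-library sketch below applies the same logic to hypothetical data with a shorter window. One design choice differs from the cell above and is an assumption of this sketch: the pinned country is excluded from the ranking, so it cannot appear twice if it is also among the top countries.

```python
# Hypothetical cumulative counts per country, one value per day.
CUMULATIVE = {
    "X": [0, 5, 12, 20],
    "Y": [0, 1, 2, 30],
    "Z": [10, 10, 11, 11],
}


def last_days_table(cumulative, days, top, pinned):
    """Rows of (country, daily counts for the last `days` days, window total),
    with `pinned` first and the `top` other countries ranked by the latest day."""
    daily = {
        country: [b - a for a, b in zip(series, series[1:])][-days:]
        for country, series in cumulative.items()
    }
    others = sorted((c for c in daily if c != pinned), key=lambda c: daily[c][-1], reverse=True)
    order = ([pinned] if pinned in daily else []) + others[:top]
    return [(c, daily[c], sum(daily[c])) for c in order]


for row in last_days_table(CUMULATIVE, days=2, top=2, pinned="Z"):
    print(row)
# ('Z', [1, 0], 1)
# ('Y', [1, 28], 29)
# ('X', [7, 8], 15)
```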

    7. Daily new cases for the 14 most affected countries + China

    In [10]:
    df_conf_grouped = df_conf.groupby('Country/Region').sum()
    df_conf_grouped = df_conf_grouped.drop(columns=['Lat', 'Long'])
    # Daily new cases = first difference of the cumulative counts; the first date has no previous day
    df_conf_grouped = df_conf_grouped.diff(axis=1).fillna(0)
    # Rank countries by the most recent day's new cases
    df_conf_grouped = df_conf_grouped.sort_values(by=df_conf_grouped.columns[-1], ascending=False)
    df_china = df_conf_grouped[df_conf_grouped.index=='China']
    df_conf_grouped = df_conf_grouped.head(14)
    df_total = pd.concat([df_china, df_conf_grouped])
    # Transpose: one column per country, one row per date
    df_conf_t = df_total.T
    
    plot_rows = 5
    plot_cols = 3
    countries = list(df_conf_t.columns[:plot_rows * plot_cols])
    fig = make_subplots(rows=plot_rows, cols=plot_cols, subplot_titles=countries, shared_xaxes=True,
                        vertical_spacing=0.05)
    # Fill the grid row by row: panel k goes to row k // plot_cols + 1, column k % plot_cols + 1
    for k, country in enumerate(countries):
        row, col = divmod(k, plot_cols)
        fig.add_trace(go.Bar(name=country, x=df_conf_t.index, y=df_conf_t[country].values),
                      row=row + 1, col=col + 1)
    fig.update_layout(showlegend=False, height=900, width=980, margin=dict(l=20, r=20, t=20, b=20))
    fig.show()
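    The subplot grid is filled row by row, so panel k (counting from 0) lands in row k // 3 + 1 and column k % 3 + 1 of the 5 by 3 layout. Here is a tiny sketch of that mapping.

```python
def grid_positions(n_panels, n_cols):
    """1-based (row, col) of each panel when a grid is filled left to right, top to bottom."""
    return [(k // n_cols + 1, k % n_cols + 1) for k in range(n_panels)]


print(grid_positions(5, 3))  # [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
```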
    

    8. Daily deaths in the last 10 days for the 14 most affected countries + China

    In [11]:
    # Daily deaths in the last 10 days for China + the 14 countries with most deaths on the latest day (table)
    df_deaths_grouped = df_deaths.groupby('Country/Region').sum()
    df_deaths_grouped = df_deaths_grouped.drop(columns=['Lat', 'Long'])
    # Daily deaths = first difference of the cumulative counts along the date columns
    df_deaths_grouped = df_deaths_grouped.diff(axis=1)
    df_deaths_grouped.reset_index(level=0, inplace=True)
    # Keep 'Country/Region' plus the last 10 date columns
    df_deaths_grouped = df_deaths_grouped[['Country/Region'] + list(df_deaths_grouped.columns[-10:])]
    df_deaths_grouped = df_deaths_grouped.sort_values(by=df_deaths_grouped.columns[-1], ascending=False)
    df_deaths_grouped['Total Deaths'] = df_deaths_grouped.sum(axis=1, numeric_only=True)
    df_china = df_deaths_grouped[df_deaths_grouped['Country/Region']=='China']
    df_deaths_grouped = df_deaths_grouped.head(14)
    df_total = pd.concat([df_china, df_deaths_grouped], ignore_index=True)
    cols = df_total.columns[1:-1]
    df_total.style.background_gradient(subset=cols, cmap='Reds', axis=1)
    
    Out[11]:
        Country/Region  5/13/20  5/14/20  5/15/20  5/16/20  5/17/20  5/18/20  5/19/20  5/20/20  5/21/20  5/22/20  Total Deaths
    0   China                 0        0        0        1        0        0        0        0        0        0             1
    1   US                 1763     1779     1632     1224      808      785     1574     1518     1263     1277         13623
    2   Brazil              779      759      963      700      456      735     1130      876     1188     1001          8587
    3   Spain               184      217      138      104        0      146       69      110       52      688          1708
    4   Mexico              294      257      290      278      132      155      334      424      420      479          3063
    5   United Kingdom      495      429      385      468      170      160      546      364      338      351          3706
    6   Russia               96       93      113      119       94       91      115      135      127      150          1133
    7   India               136       98      104      118      154      131      146      132      150      142          1311
    8   Italy               195      262      242      153      145       99      162      161      156      130          1705
    9   Ecuador               7        4      256       94       48       63       40       49       51      117           729
    10  Peru                112       98      125      131      125      141      125      110      124       96          1187
    11  Canada              125      167       87      121      103       57       68      122      117       93          1060
    12  Sweden              147       69      117       28        5       19       45       88       40       54           612
    13  Iran                 50       71       48       35       51       69       62       64       66       51           567
    14  Pakistan             24        9       64        0       39       30       36       46       32       50           330

    9. Daily deaths for the 14 most affected countries + China

    In [12]:
    df_deaths_grouped = df_deaths.groupby('Country/Region').sum()
    df_deaths_grouped = df_deaths_grouped.drop(columns=['Lat', 'Long'])
    # Daily deaths = first difference of the cumulative counts; the first date has no previous day
    df_deaths_grouped = df_deaths_grouped.diff(axis=1).fillna(0)
    # Rank countries by the most recent day's deaths
    df_deaths_grouped = df_deaths_grouped.sort_values(by=df_deaths_grouped.columns[-1], ascending=False)
    df_china = df_deaths_grouped[df_deaths_grouped.index=='China']
    df_deaths_grouped = df_deaths_grouped.head(14)
    df_total = pd.concat([df_china, df_deaths_grouped])
    # Transpose: one column per country, one row per date
    df_deaths_t = df_total.T
    
    plot_rows = 5
    plot_cols = 3
    countries = list(df_deaths_t.columns[:plot_rows * plot_cols])
    fig = make_subplots(rows=plot_rows, cols=plot_cols, subplot_titles=countries, shared_xaxes=True,
                        vertical_spacing=0.05)
    # Fill the grid row by row: panel k goes to row k // plot_cols + 1, column k % plot_cols + 1
    for k, country in enumerate(countries):
        row, col = divmod(k, plot_cols)
        fig.add_trace(go.Bar(name=country, x=df_deaths_t.index, y=df_deaths_t[country].values),
                      row=row + 1, col=col + 1)
    fig.update_layout(showlegend=False, height=900, width=980, margin=dict(l=20, r=20, t=20, b=20))
    fig.show()
    

    10. COVID-19 in United Kingdom Analysis

    In [13]:
    # Importing data (CSV files downloaded from GOV.UK, https://coronavirus.data.gov.uk):
    url1 = 'coronavirus-cases_latest.csv'
    df_cases = pd.read_csv(url1)
    
    url2 = 'coronavirus-deaths_latest.csv'
    df_deaths = pd.read_csv(url2)
    
    In [14]:
    df_cases.head(5)
    
    Out[14]:
    Area name Area code Area type Specimen date Daily lab-confirmed cases Previously reported daily cases Change in daily cases Cumulative lab-confirmed cases Previously reported cumulative cases Change in cumulative cases Cumulative lab-confirmed cases rate
    0 England E92000001 Nation 2020-05-21 38 38 147745 146662 1083 264
    1 South West E12000009 Region 2020-05-21 3 NaN NaN 7476 NaN NaN 134
    2 South East E12000008 Region 2020-05-21 3 NaN NaN 20816 NaN NaN 228
    3 London E12000007 Region 2020-05-21 1 NaN NaN 26683 NaN NaN 300
    4 East of England E12000006 Region 2020-05-21 4 NaN NaN 13558 NaN NaN 219
    In [15]:
    df_deaths.head(5)
    
    Out[15]:
       Area name         Area code  Area type  Reporting date  Daily change in deaths  Cumulative deaths
    0  Wales             W92000004  Nation     2020-05-22      7                       1254
    1  Scotland          S92000003  Nation     2020-05-22      37                      2221
    2  Northern Ireland  N92000002  Nation     2020-05-22      7                       501
    3  United Kingdom    K02000001  UK         2020-05-22      351                     36393
    4  England           E92000001  Nation     2020-05-22      300                     32417
    In [16]:
    #Organizing and cleaning the data
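    # Keep only nation- and region-level rows (copy to avoid modifying a slice)
    df_cases = df_cases[df_cases["Area type"].isin(["Nation", "Region"])].copy()
    
    # Drop 'Previously reported daily cases', 'Change in daily cases', 'Previously reported
    # cumulative cases', 'Change in cumulative cases' and 'Cumulative lab-confirmed cases rate'
    cols = [5,6,8,9,10]
    df_cases.drop(df_cases.columns[cols],axis=1,inplace=True)
    
    # Rename to match the deaths table. Note that the cases file is dated by specimen date,
    # so 'Reporting date' here holds the original 'Specimen date' values.
    df_cases.columns = ['Area name',
                        'Area code',
                        'Area type',
                        'Reporting date',
                        'Daily change in confirmed cases',
                        'Cumulative confirmed cases']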
    
    df_cases.head(5)
    
    Out[16]:
       Area name        Area code  Area type  Reporting date  Daily change in confirmed cases  Cumulative confirmed cases
    0  England          E92000001  Nation     2020-05-21      38                               147745
    1  South West       E12000009  Region     2020-05-21      3                                7476
    2  South East       E12000008  Region     2020-05-21      3                                20816
    3  London           E12000007  Region     2020-05-21      1                                26683
    4  East of England  E12000006  Region     2020-05-21      4                                13558
    In [17]:
    #EDA
    df_cases.dtypes
    df_deaths.dtypes
    
    # Converting to datetime type
    df_deaths["Reporting date"] = pd.to_datetime(df_deaths["Reporting date"])
    df_cases["Reporting date"] = pd.to_datetime(df_cases["Reporting date"])
    
    # Identifying weekdays and ISO week of year (Series.dt.week was removed in pandas 2.0)
    df_cases["Weekday"] = df_cases['Reporting date'].dt.day_name()
    df_deaths["Weekday"] = df_deaths['Reporting date'].dt.day_name()
    df_cases["Week_of_year"] = df_cases['Reporting date'].dt.isocalendar().week
    df_deaths["Week_of_year"] = df_deaths['Reporting date'].dt.isocalendar().week
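    `Weekday` and `Week_of_year` depend only on the date, so the same values can be checked with the standard library. `isocalendar()` returns the ISO 8601 week number, which matches the `Week_of_year` values shown for 21 and 22 May 2020 (week 21). The weekday names are taken from a fixed English list here, so the result does not depend on the system locale.

```python
from datetime import date


def weekday_and_iso_week(day):
    """Weekday index (Monday=0), English weekday name and ISO 8601 week number of a date."""
    names = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
    return day.weekday(), names[day.weekday()], day.isocalendar()[1]


print(weekday_and_iso_week(date(2020, 5, 21)))  # (3, 'Thursday', 21)
print(weekday_and_iso_week(date(2020, 5, 22)))  # (4, 'Friday', 21)
```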
    
    In [18]:
    df_cases.head()
    
    Out[18]:
       Area name        Area code  Area type  Reporting date  Daily change in confirmed cases  Cumulative confirmed cases  Weekday   Week_of_year
    0  England          E92000001  Nation     2020-05-21      38                               147745                      Thursday  21
    1  South West       E12000009  Region     2020-05-21      3                                7476                        Thursday  21
    2  South East       E12000008  Region     2020-05-21      3                                20816                       Thursday  21
    3  London           E12000007  Region     2020-05-21      1                                26683                       Thursday  21
    4  East of England  E12000006  Region     2020-05-21      4                                13558                       Thursday  21
    In [19]:
    df_deaths.head()
    
    Out[19]:
       Area name         Area code  Area type  Reporting date  Daily change in deaths  Cumulative deaths  Weekday  Week_of_year
    0  Wales             W92000004  Nation     2020-05-22      7                       1254               Friday   21
    1  Scotland          S92000003  Nation     2020-05-22      37                      2221               Friday   21
    2  Northern Ireland  N92000002  Nation     2020-05-22      7                       501                Friday   21
    3  United Kingdom    K02000001  UK         2020-05-22      351                     36393              Friday   21
    4  England           E92000001  Nation     2020-05-22      300                     32417              Friday   21

    10.1. Total number of deaths in UK

    In [20]:
    # Cumulative Deaths Plot
    fig = px.line(df_deaths, x="Reporting date", y="Cumulative deaths", color='Area name', line_shape='spline')
    
    fig.update_xaxes(
        #ticktext=list(range(1, len(df_deaths['Reporting date'].unique()))),
        #tickvals=df_deaths["Reporting date"],
        nticks=40,
        rangeslider_visible=True,
        rangeselector=dict(
            buttons=list([
                dict(count=7, label="1w", step="day", stepmode="backward"),
                dict(count=1, label="1m", step="month", stepmode="backward"),
                dict(count=3, label="3m", step="month", stepmode="todate"),
                dict(step="all")
            ])
        )
    )
    
    fig.update_layout(
        title={
            'text': "Cumulative Deaths",
            'y':0.95,
            'x':0.45,
            'xanchor': 'center',
            'yanchor': 'top'},
        xaxis_tickformat = '%d/%m'
    )
    
    
    
    #fig.update_traces(mode='markers+lines')
    fig.show()
    

    10.2. Daily deaths in UK

    In [21]:
    t_uk = df_deaths[df_deaths['Area name']=="United Kingdom"]
    t_eng = df_deaths[df_deaths['Area name']=="England"]
    t_sco = df_deaths[df_deaths['Area name']=="Scotland"]
    t_wal = df_deaths[df_deaths['Area name']=="Wales"]
    t_noi = df_deaths[df_deaths['Area name']=="Northern Ireland"]
    
    fig = make_subplots(
        rows=2, cols=3,
        subplot_titles=("United Kingdom","England", "Scotland", "Wales", "Northern Ireland"))
    
    fig.add_trace(
        go.Bar(x = t_uk['Reporting date'], y=t_uk['Daily change in deaths']),
        row=1, col=1)
    fig.add_trace(
        go.Bar(x = t_eng['Reporting date'], y=t_eng['Daily change in deaths']),
        row=1, col=2)
    fig.add_trace(
        go.Bar(x = t_sco['Reporting date'], y=t_sco['Daily change in deaths']),
        row=1, col=3)
    fig.add_trace(
        go.Bar(x = t_wal['Reporting date'], y=t_wal['Daily change in deaths']),
        row=2, col=1)
    fig.add_trace(
        go.Bar(x = t_noi['Reporting date'], y=t_noi['Daily change in deaths']),
        row=2, col=2)
    
    #fig.update_layout(showlegend=False, margin=dict(l=20, r=20, t=20, b=20))
    
    fig.update_xaxes(
        nticks=20)
    
    fig.update_layout(
        xaxis_tickformat = '%d/%m',
        showlegend=False,
        title={
            'text': "Daily change in deaths",
            'y':0.95,
            'x':0.5,
            'xanchor': 'center',
            'yanchor': 'top'}
    )
    
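    # Mark the lockdown announcement (23 March 2020) on the United Kingdom panel. The row is
    # selected by date rather than by a fixed position, which shifts as new days are added.
    lockdown = t_uk[t_uk['Reporting date'] == pd.Timestamp('2020-03-23')]
    fig.add_annotation(x=lockdown['Reporting date'].iloc[0],
                y=lockdown['Daily change in deaths'].iloc[0],
                xref="x",
                yref="y",
                text="lockdown",
                showarrow=True,
                arrowhead=2,
                ax=-20,
                ay=-40,
                font=dict(
                    color="black",
                    size=11
                )
    )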
    fig.show()
    

    10.3. Trend line of daily deaths in UK

    The Prime Minister of the United Kingdom announced lockdown measures on March 23. To analyse the trend in daily COVID-19 deaths from that date onwards, two trend lines were fitted: LOWESS (locally weighted scatterplot smoothing, also called locally weighted polynomial regression) and OLS (ordinary least squares).
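    LOWESS fits a separate weighted straight line around each point, giving nearby days more weight, so the resulting curve can rise and then fall. OLS, by contrast, fits a single straight line to all points. Plotly's `trendline="lowess"` relies on statsmodels. The code below is a simplified, standard-library sketch of the LOWESS idea: tricube weights, a local linear fit and a smoothing fraction of 2/3, which is also the statsmodels default. It omits the robustness iterations that statsmodels performs, so its values are not expected to match the plot exactly.

```python
def lowess(x, y, frac=2 / 3):
    """Simplified LOWESS: for each x_i, a weighted linear fit over its nearest neighbours."""
    n = len(x)
    r = min(n, max(2, round(frac * n)))  # number of neighbours in each local window
    fitted = []
    for xi in x:
        # Window half-width = distance to the r-th nearest point (guard against zero)
        h = sorted(abs(xj - xi) for xj in x)[r - 1] or 1e-12
        # Tricube weights: 1 at xi, falling to 0 at distance h
        w = [max(0.0, 1 - (abs(xj - xi) / h) ** 3) ** 3 for xj in x]
        sw = sum(w)
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted


# Hypothetical daily counts that rise and then fall.
days = list(range(11))
counts = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0]
smooth = lowess(days, counts)
print([round(v, 2) for v in smooth])
```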

    In [22]:
    # LOWESS trendline
    t_uk_inv = t_uk.iloc[::-1].reset_index()
    t_uk_inv = t_uk_inv[t_uk_inv['Reporting date'] > '2020-03-22']
    fig = px.scatter(t_uk_inv, x=t_uk_inv.index, y="Daily change in deaths", trendline="lowess")
    fig.update_xaxes(
        ticktext=list(t_uk_inv['Reporting date'].dt.strftime('%d/%m')),
        tickvals=t_uk_inv.index,
    )
    fig.update_layout(
        showlegend=False,
        title={
            'text': "Daily change in deaths since the lockdown",
            'y':0.95,
            'x':0.5,
            'xanchor': 'center',
            'yanchor': 'top'},
        xaxis = dict(
            title_text = "")
    )
    
    #fig.add_annotation(
    #            x=t_uk_inv.iloc[30, 4],
    #            y=t_uk_inv.iloc[30, 5],
    #            text="lockdown")
    
    fig.show()
    

    The plot shows that from the lockdown announcement on March 23 to about April 16, the LOWESS trend line was rising (an increasing daily death toll). The pattern changed around April 17, about 25 days after the lockdown; from then on, the trend line declines (a decreasing daily death toll).

    In [23]:
    # OLS trendline
    fig = px.scatter(t_uk_inv, x=t_uk_inv.index, y="Daily change in deaths", trendline="ols")
    
    fig.update_xaxes(
        ticktext=list(t_uk_inv['Reporting date'].dt.strftime('%d/%m')),
        tickvals=t_uk_inv.index,
    )
    
    fig.update_layout(
        xaxis_tickformat = '%d/%m',
        showlegend=False,
        title={
            'text': "Daily change in deaths since the lockdown",
            'y':0.95,
            'x':0.5,
            'xanchor': 'center',
            'yanchor': 'top'},
        xaxis = dict(
            title_text = "")
    )
    fig.show()
    results = px.get_trendline_results(fig)
    results.px_fit_results.iloc[0].summary()
    
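    Out[23]:
                                OLS Regression Results
    ==============================================================================
    Dep. Variable:                      y   R-squared:                       0.017
    Model:                            OLS   Adj. R-squared:                 -0.000
    Method:                 Least Squares   F-statistic:                    0.9956
    Date:                Sat, 23 May 2020   Prob (F-statistic):              0.322
    Time:                        19:26:21   Log-Likelihood:                -432.56
    No. Observations:                  61   AIC:                             869.1
    Df Residuals:                      59   BIC:                             873.3
    Df Model:                           1
    Covariance Type:            nonrobust
    ==============================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    const        692.7297    107.875      6.422      0.000     476.873     908.587
    x1            -2.1446      2.149     -0.998      0.322      -6.445       2.156
    ==============================================================================
    Omnibus:                        3.453   Durbin-Watson:                   0.603
    Prob(Omnibus):                  0.178   Jarque-Bera (JB):                1.816
    Skew:                           0.087   Prob(JB):                        0.403
    Kurtosis:                       2.173   Cond. No.                         143.
    ==============================================================================

    Warnings:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.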

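    The slope x1 (-2.1446) is the average predicted change in y (daily deaths) for a one-unit increase in x (one day), so the fitted line points to a slightly decreasing death toll since the lockdown. However, the slope is not statistically significant (P>|t| = 0.322, with a 95% interval from -6.445 to 2.156), and R-squared is only 0.017. A single straight line describes these data poorly because daily deaths first rose and then fell, as the LOWESS curve shows. The low Durbin-Watson statistic (0.603) also indicates positively autocorrelated residuals.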

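    Taken together, the plots are consistent with the lockdown measures having helped reduce the number of daily deaths caused by COVID-19 after a lag of a few weeks. This descriptive analysis alone, however, cannot establish causation.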
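    With a single predictor, the OLS slope and intercept have a closed form: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and intercept = ȳ − slope · x̄. Below is a standard-library sketch on hypothetical data, where x is the day index and y the daily deaths, as in the regression above.

```python
def ols_line(x, y):
    """Intercept and slope of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical daily deaths over four consecutive days (x = day index).
intercept, slope = ols_line([0, 1, 2, 3], [10, 8, 7, 5])
print(round(intercept, 2), round(slope, 2))  # 9.9 -1.6
```

    Here a slope of -1.6 means the fitted line predicts 1.6 fewer deaths for each additional day. The t statistic and P>|t| in the regression table assess whether such a slope is distinguishable from zero.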

    10.4. Daily new cases and deaths in England

    In [24]:
    t_eng_c = df_cases[df_cases['Area name']=="England"]
    
    fig = go.Figure()
    
    fig.add_trace(go.Scatter(
        x=t_eng_c["Reporting date"],
        y=t_eng_c["Daily change in confirmed cases"],
        name='Confirmed Cases',
        text = t_eng_c["Weekday"],
        #marker_color='lightsalmon',
        fill='tozeroy'
    ))
    
    fig.add_trace(go.Scatter(
        x=t_eng["Reporting date"],
        y=t_eng["Daily change in deaths"],
        name='Deaths',
        text = t_eng["Weekday"],
        fill='tozeroy'
    
    ))      
    
    fig.update_layout(legend=dict(
            x=0,
            y=1.0,
            bgcolor='rgba(255, 255, 255, 0)',
            bordercolor='rgba(255, 255, 255, 0)'
            ), yaxis_title="Number of occurrences",
                     margin=dict(l=20, r=20, t=20, b=20),
            title={
            'text': "Daily deaths and confirmed cases",
            'y':0.98,
            'x':0.5,
            'xanchor': 'center',
            'yanchor': 'top'},
        xaxis_tickformat = '%d/%m'
    )
    
    fig.update_xaxes(
        #ticktext=list(range(1, len(df_deaths['Reporting date'].unique()))),
        #tickvals=df_deaths["Reporting date"],
        nticks=40,
        rangeslider_visible=True,
        rangeselector=dict(
            buttons=list([
                dict(count=7, label="1w", step="day", stepmode="backward"),
                dict(count=1, label="1m", step="month", stepmode="backward"),
                dict(count=3, label="3m", step="month", stepmode="todate"),
                dict(step="all")
            ])
        )
    )
    
    fig.show()
    

    The plot shows that daily deaths and confirmed cases are decreasing. However, both curves are irregular from one day to the next, which suggests that cases and deaths are not evenly distributed across the week. The mean number of deaths and confirmed cases per weekday was therefore analysed.

    In [25]:
    # Mean per weekday (numeric columns only; text columns cannot be averaged)
    weekday_d = t_eng.groupby("Weekday").mean(numeric_only=True).reset_index()
    cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
    weekday_d['Weekday'] = pd.Categorical(weekday_d['Weekday'], categories=cats, ordered=True)
    weekday_d = weekday_d.sort_values('Weekday')
    weekday_d = weekday_d.iloc[:, :-1]
    
    weekday_c = t_eng_c.groupby("Weekday").mean(numeric_only=True).reset_index()
    weekday_c['Weekday'] = pd.Categorical(weekday_c['Weekday'], categories=cats, ordered=True)
    weekday_c = weekday_c.sort_values('Weekday')
    
    fig = make_subplots(rows=1, cols=2)
    
    fig.add_trace(go.Scatter(x=weekday_d['Weekday'], y=weekday_d['Daily change in deaths'],
                             mode='lines+markers', name="mean of deaths"), row=1, col=1)
    
    fig.add_trace(go.Scatter(x=weekday_c['Weekday'], y=weekday_c['Daily change in confirmed cases'],
                             mode='lines+markers', name="mean of confirmed cases"), row=1, col=2)
    
    fig.update_layout(
            title={
                'text': "Mean per weekday of daily deaths and confirmed cases",
                'y':0.93,
                'x':0.42,
                'xanchor': 'center',
                'yanchor': 'top'},
            yaxis_title="Number of occurrences"
    )
    fig.show()
    

    The result confirms an uneven distribution of cases and deaths over the week. The mean numbers of deaths and confirmed cases are lowest on Sundays and highest on Tuesdays. This pattern may be related to notifications not being processed over weekends.
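    The weekday profile is a mean of the daily counts grouped by weekday, with the weekdays ordered Monday to Sunday; this ordering is the role of `pd.Categorical` in the cell above. Here is a standard-library sketch using two hypothetical weeks of data.

```python
from collections import defaultdict
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def mean_by_weekday(start, counts):
    """Mean count per weekday for consecutive daily counts beginning on `start`, Monday first."""
    buckets = defaultdict(list)
    for offset, value in enumerate(counts):
        buckets[(start + timedelta(days=offset)).weekday()].append(value)
    return {WEEKDAYS[d]: sum(v) / len(v) for d, v in sorted(buckets.items())}


# Two hypothetical weeks starting on Monday 4 May 2020, lower at weekends.
counts = [10, 12, 11, 11, 10, 6, 4,
          12, 14, 11, 11, 10, 8, 6]
print(mean_by_weekday(date(2020, 5, 4), counts))
# {'Monday': 11.0, 'Tuesday': 13.0, 'Wednesday': 11.0, 'Thursday': 11.0, 'Friday': 10.0, 'Saturday': 7.0, 'Sunday': 5.0}
```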